Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions.
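The moving-average step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a convolutional operation inside the CRNN, whereas the example below applies a uniform 1-D convolution to per-frame probabilities as post-processing. The function name `smooth_decisions`, the kernel size, and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np

def smooth_decisions(probs, kernel_size=3, threshold=0.5):
    """Average per-frame speech probabilities over a sliding window,
    then threshold. Kernel size and threshold are illustrative assumptions."""
    kernel = np.ones(kernel_size) / kernel_size  # uniform moving-average kernel
    # mode="same" keeps the output length equal to the input (zero-padded edges)
    smoothed = np.convolve(probs, kernel, mode="same")
    return (smoothed >= threshold).astype(int)

# Hypothetical frame probabilities: one isolated spike (index 3) in a
# non-speech region, followed by a genuine speech segment.
probs = np.array([0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9])
labels = smooth_decisions(probs)
print(labels.tolist())
```

In this toy input the isolated spike is averaged down below the threshold, while the sustained speech segment survives.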
NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-the-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verification task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-the-art PLDA-based speaker verification system.
Comment: Published in Odyssey 2020, the Speaker and Language Recognition
Workshop (VOiCES Special Session). Link to GitHub Implementation:
https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text
overlap with arXiv:2001.0703
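To make the idea of a loss that approximates the detection cost function more concrete, the sketch below contrasts a hard detection cost, which counts misses and false alarms at a threshold, with a differentiable version in which each count is replaced by a sigmoid. This is an illustration under stated assumptions and not the NPLDA code from the linked repository: the function names, the cost weights, the target prior `p_target`, and the sigmoid sharpness `alpha` are all hypothetical choices.

```python
import numpy as np

def detection_cost(scores, labels, threshold, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Hard detection cost: weighted sum of miss and false-alarm rates.
    The weights and prior are illustrative assumptions."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    p_miss = np.mean(scores[labels] < threshold)   # target trials rejected
    p_fa = np.mean(scores[~labels] >= threshold)   # non-target trials accepted
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def soft_detection_cost(scores, labels, threshold, alpha=100.0,
                        c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Differentiable surrogate: the step function is replaced by a sigmoid
    with sharpness alpha (a hypothetical parameter)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    accept = 1.0 / (1.0 + np.exp(-alpha * (scores - threshold)))
    p_miss = np.mean(1.0 - accept[labels])
    p_fa = np.mean(accept[~labels])
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# Hypothetical trial scores: first three are target trials, last three are not.
scores = [2.0, 1.5, -0.5, 0.3, -1.0, -2.0]
labels = [1, 1, 1, 0, 0, 0]
hard = detection_cost(scores, labels, threshold=0.0)
soft = soft_detection_cost(scores, labels, threshold=0.0)
```

With a large `alpha` the soft cost closely tracks the hard cost on this toy data; a smaller `alpha` gives smoother gradients at the price of a looser approximation.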
Spoof detection using time-delay shallow neural network and feature switching
Detecting spoofed utterances is a fundamental problem in voice-based
biometrics. Spoofing can be performed either through logical access, such as
speech synthesis or voice conversion, or through physical access, such as
replaying a pre-recorded utterance. Inspired by the state-of-the-art x-vector based
speaker verification approach, this paper proposes a time-delay shallow neural
network (TD-SNN) for spoof detection for both logical and physical access. The
novelty of the proposed TD-SNN system vis-a-vis conventional DNN systems is
that it can handle variable length utterances during testing. Performance of
the proposed TD-SNN systems and the baseline Gaussian mixture models (GMMs) is
analyzed on the ASV-spoof-2019 dataset. The performance of the systems is
measured in terms of the minimum normalized tandem detection cost function
(min-t-DCF). When studied with individual features, the TD-SNN system
consistently outperforms the GMM system for physical access. For logical
access, GMM surpasses TD-SNN systems for certain individual features. When
combined with the decision-level feature switching (DLFS) paradigm, the best
TD-SNN system outperforms the best baseline GMM system on evaluation data with
relative improvements of 48.03% and 49.47% for logical and physical
access, respectively.
Design and simulation of 1.28 Tbps dense wavelength division multiplex system suitable for long haul backbone
A wavelength division multiplex (WDM) system with on/off keying (OOK)
modulation and direct detection (DD) is generally simple to implement,
inexpensive, and energy efficient. The determination of the possible design
capacity limit, in terms of the bit rate-distance product in WDM-OOK-DD systems
is therefore crucial, considering transmitter / receiver simplicity, as well as
energy and cost efficiency. A 32-channel wavelength division multiplex system
is designed and simulated over 1000 km fiber length using Optsim commercial
simulation software. The standard channel spacing of 0.4 nm was used in the
C-band range from 1543.6 to 1556 nm. Each channel used the simple
non-return-to-zero on/off keying (NRZ-OOK) modulation format to modulate a continuous
wave (CW) laser source at 40 Gbps using an external modulator, while the
receiver uses a DD scheme. It is proposed that the design will be suitable for
long haul mobile backbone in a national network, since up to 1.28 Tbps data
rates can be transmitted over 1000 km. A bit rate-length product of 1.28
Pbps·km was obtained as the optimum capacity limit in a 32-channel
dispersion-managed WDM-OOK-DD system.
Comment: Accepted for publication in Journal of Optical Communications - De
Gruyte
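The headline figures in the abstract above follow from simple arithmetic, sketched here as a quick consistency check. The variable names are illustrative, and the band edges are assumed to be 1543.6 nm and 1556 nm.

```python
# Values taken from the abstract
channels = 32
bit_rate_gbps = 40      # per-channel bit rate
length_km = 1000        # fiber length
spacing_nm = 0.4        # channel spacing

# Aggregate capacity: 32 channels x 40 Gbps
aggregate_tbps = channels * bit_rate_gbps / 1000

# Bit rate-length product, converted from Tbps.km to Pbps.km
bl_product_pbps_km = aggregate_tbps * length_km / 1000

# Spectral span between the first and last channel centers
span_nm = (channels - 1) * spacing_nm

# Assumed band edges in nm (see lead-in)
band_start_nm, band_end_nm = 1543.6, 1556.0

print(f"aggregate capacity: {aggregate_tbps:.2f} Tbps")
print(f"bit rate-length product: {bl_product_pbps_km:.2f} Pbps.km")
print(f"channel span: {span_nm:.1f} nm, band width: {band_end_nm - band_start_nm:.1f} nm")
```

The 31 channel gaps at 0.4 nm span 12.4 nm, which matches the width of the assumed band.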